Reindexing
1> Reindexing to align with another object
import numpy as np
import pandas as pd

df_1 = pd.DataFrame(np.random.randn(5,3), columns=['col1','col2','col3'])
df_2 = pd.DataFrame(np.random.randn(3,3), columns=['col1','col2','col3'])
print(f'Initial DataFrame:\n{df_1}')
# Output:
# Initial DataFrame:
# col1 col2 col3
# 0 -2.566516 0.460331 -0.686840
# 1 -1.429907 -0.603902 2.349435
# 2 -0.878803 -0.023454 -1.715970
# 3 -1.902423 0.074308 -2.150251
# 4 0.041810 -1.143817 -1.676994
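# Give df_1 the same row and column labels as df_2; rows 3 and 4 are dropped
df_1 = df_1.reindex_like(df_2)
print(f'After reindexing to match df_2:\n{df_1}')
# Output:
# After reindexing to match df_2: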
# col1 col2 col3
# 0 -2.566516 0.460331 -0.686840
# 1 -1.429907 -0.603902 2.349435
# 2 -0.878803 -0.023454 -1.715970
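As a cross-check, `reindex_like(other)` behaves like calling `reindex` with the other object's row and column labels. The sketch below is only an illustration: the frames `big` and `small` are hypothetical and use fixed values instead of random ones so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical frames with fixed values so the result is reproducible.
big = pd.DataFrame(np.arange(15.0).reshape(5, 3), columns=['col1', 'col2', 'col3'])
small = pd.DataFrame(np.zeros((3, 3)), columns=['col1', 'col2', 'col3'])

# Take on the row and column labels of `small`; rows 3 and 4 are dropped.
aligned = big.reindex_like(small)

# The same result, spelled out with reindex.
explicit = big.reindex(index=small.index, columns=small.columns)

print(aligned)
print(aligned.equals(explicit))
```

Only the labels of `small` matter here; its values are never read.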
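2> Filling while reindexing
Fill methods (values of the `method` parameter):
| Value | Description |
|---|---|
| pad/ffill | Fill forward: propagate the last valid value to the following new labels |
| bfill/backfill | Fill backward: use the next valid value to fill the preceding new labels |
| nearest | Fill from the nearest index value |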
df_1 = pd.DataFrame(np.random.randn(5,3), columns=['col1','col2','col3'])
df_2 = pd.DataFrame(np.random.randn(3,3), columns=['col1','col2','col3'])
print(f'Initial DataFrame:\n{df_1}')
# Output:
# Initial DataFrame:
# col1 col2 col3
# 0 -0.682017 -1.829388 -1.995502
# 1 -1.337590 -0.582744 0.737791
# 2 -0.920093 -0.797116 -0.627841
# 3 -0.775704 0.569862 2.184947
# 4 -0.806158 0.886839 -1.225187
print(f'Filled with NaN:\n{df_2.reindex_like(df_1)}')
# Output:
# Filled with NaN:
# col1 col2 col3
# 0 -1.362653 2.202365 1.440829
# 1 -2.137158 -0.251861 -1.530036
# 2 -0.030374 0.018452 1.533934
# 3 NaN NaN NaN
# 4 NaN NaN NaN
print(f'Forward-filled from the previous row:\n{df_2.reindex_like(df_1, method="ffill")}')
# Output:
# Forward-filled from the previous row:
# col1 col2 col3
# 0 -1.362653 2.202365 1.440829
# 1 -2.137158 -0.251861 -1.530036
# 2 -0.030374 0.018452 1.533934
# 3 -0.030374 0.018452 1.533934
# 4 -0.030374 0.018452 1.533934
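To compare the three fill methods side by side, here is a minimal sketch using `Series.reindex` with the same `method` values. The Series `s` and the `target` labels are made up for illustration, and the labels are chosen so that no target sits exactly halfway between two existing ones.

```python
import pandas as pd

# Hypothetical Series with gaps between its index labels.
s = pd.Series([1.0, 2.0, 3.0], index=[0, 10, 20])
target = [0, 3, 8, 12, 20, 25]

# ffill: use the value at the last existing label at or before each target label.
forward = s.reindex(target, method='ffill')
# bfill: use the value at the next existing label at or after each target label.
backward = s.reindex(target, method='bfill')
# nearest: use the value at whichever existing label is closest.
nearest = s.reindex(target, method='nearest')

print(pd.DataFrame({'ffill': forward, 'bfill': backward, 'nearest': nearest}))
```

The fill is driven by index labels, so the original index has to be monotonically increasing or decreasing for `method` to be accepted. Label 25 has no later label to fill backward from, so `bfill` leaves it as NaN.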
3> Limiting fills while reindexing
df_1 = pd.DataFrame(np.random.randn(5,3), columns=['col1','col2','col3'])
df_2 = pd.DataFrame(np.random.randn(3,3), columns=['col1','col2','col3'])
print(f'Initial DataFrame:\n{df_1}')
# Output:
# Initial DataFrame:
# col1 col2 col3
# 0 -0.682017 -1.829388 -1.995502
# 1 -1.337590 -0.582744 0.737791
# 2 -0.920093 -0.797116 -0.627841
# 3 -0.775704 0.569862 2.184947
# 4 -0.806158 0.886839 -1.225187
print(f'Filled with NaN:\n{df_2.reindex_like(df_1)}')
# Output:
# Filled with NaN:
# col1 col2 col3
# 0 -1.362653 2.202365 1.440829
# 1 -2.137158 -0.251861 -1.530036
# 2 -0.030374 0.018452 1.533934
# 3 NaN NaN NaN
# 4 NaN NaN NaN
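# limit=1 allows at most one consecutive row to be filled
print(f'Forward-filled, at most 1 row:\n{df_2.reindex_like(df_1, method="ffill", limit=1)}')
# Output:
# Forward-filled, at most 1 row: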
# col1 col2 col3
# 0 -1.362653 2.202365 1.440829
# 1 -2.137158 -0.251861 -1.530036
# 2 -0.030374 0.018452 1.533934
# 3 -0.030374 0.018452 1.533934
# 4 NaN NaN NaN
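The effect of `limit` is easier to see when more than two labels are missing. The sketch below assumes a small hypothetical Series `s` and uses `Series.reindex`, which accepts the same `method` and `limit` arguments.

```python
import pandas as pd

# Hypothetical Series with index labels 0, 1, 2.
s = pd.Series([1.0, 2.0, 3.0])

# Without a limit, every new label after 2 receives the last value.
unlimited = s.reindex(range(6), method='ffill')

# With limit=1, only the first new label (3) is filled; 4 and 5 stay NaN.
limited = s.reindex(range(6), method='ffill', limit=1)

print(pd.DataFrame({'unlimited': unlimited, 'limit=1': limited}))
```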
4> Renaming
df_1 = pd.DataFrame(np.random.randn(5,3), columns=['col1','col2','col3'])
print(f'Initial DataFrame:\n{df_1}')
# Output:
# Initial DataFrame:
# col1 col2 col3
# 0 0.552349 0.573317 0.736866
# 1 0.354193 -3.032830 0.794273
# 2 1.930062 -0.761772 0.242552
# 3 1.278869 -1.219668 0.405299
# 4 0.368458 0.730534 -0.291857
print(f'After renaming rows and columns:\n{df_1.rename(columns={"col1": "col1", "col2": "cl2", "col3": "cl33"}, index={0: "apple", 1: "banana"})}')
# Output:
# After renaming rows and columns:
# col1 cl2 cl33
# apple 0.552349 0.573317 0.736866
# banana 0.354193 -3.032830 0.794273
# 2 1.930062 -0.761772 0.242552
# 3 1.278869 -1.219668 0.405299
# 4 0.368458 0.730534 -0.291857
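Besides a dict, `rename` also accepts a function that is applied to each label, and it returns a new object rather than changing the original. A minimal sketch, with a hypothetical two-row frame `df`:

```python
import pandas as pd

# Hypothetical frame used only for illustration.
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# A function mapper is applied to every column label.
upper = df.rename(columns=str.upper)

# With a dict, labels missing from the mapping are left unchanged.
partial = df.rename(index={0: 'first'})

print(list(upper.columns))
print(list(partial.index))
print(list(df.columns))  # the original frame keeps its labels
```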
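5> Resetting the index
Replaces the current index with a new default integer index (0, 1, 2, ...).
Syntax:
obj.reset_index(drop=False)
| Parameter | Description |
|---|---|
| drop | Defaults to False, which keeps the old index by inserting it as a column; if True, the old index is discarded |
# Reset the index with the default drop=False; `data` is an existing DataFrame
data.reset_index()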
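The difference between the two `drop` settings can be sketched as follows. The frame `data` and its `score` column are hypothetical values chosen for illustration.

```python
import pandas as pd

# Hypothetical frame with a string index.
data = pd.DataFrame({'score': [90, 85, 70]}, index=['a', 'b', 'c'])

# drop=False (default): the old index becomes a column named 'index'.
kept = data.reset_index()

# drop=True: the old index is discarded.
dropped = data.reset_index(drop=True)

print(kept)
print(dropped)
```

In both cases the result has a fresh integer index starting at 0, and `data` itself is not modified.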

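6> Setting a column as the new index
Syntax:
obj.set_index(keys, drop=True)
| Parameter | Description |
|---|---|
| keys | A column label or a list of column labels |
| drop | Defaults to True, which removes the column(s) used as the new index from the columns; if False, they are kept as columns as well |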
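A short sketch of both `drop` settings, assuming a hypothetical frame `df` with `name` and `price` columns:

```python
import pandas as pd

# Hypothetical frame used only for illustration.
df = pd.DataFrame({'name': ['apple', 'banana'], 'price': [3, 1]})

# drop=True (default): 'name' moves out of the columns and becomes the index.
moved = df.set_index('name')

# drop=False: 'name' becomes the index and also stays as a column.
copied = df.set_index('name', drop=False)

print(moved)
print(copied)
```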